Combining Spectral Representations for Large Vocabulary Continuous Speech Recognition
In this paper we investigate the combination of complementary acoustic feature streams in large vocabulary continuous speech recognition (LVCSR). We have explored the use of acoustic features obtained using a pitch-synchronous analysis, STRAIGHT, in combination with conventional features such as mel frequency cepstral coefficients. Pitch-synchronous acoustic features are of particular interest when used with vocal tract length normalisation (VTLN), which is known to be affected by the fundamental frequency. We have combined these spectral representations directly at the acoustic feature level using heteroscedastic linear discriminant analysis (HLDA) and at the system level using ROVER. We evaluated this approach on three LVCSR tasks: dictated newspaper text (WSJCAM0), conversational telephone speech (CTS), and multiparty meeting transcription. The CTS and meeting transcription experiments were both evaluated using standard NIST test sets and evaluation protocols. Our results indicate that combining conventional and pitch-synchronous acoustic feature sets using HLDA gives a consistent, significant decrease in word error rate across all three tasks. Combining at the system level using ROVER resulted in a further significant decrease in word error rate.
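ROVER combines the outputs of several recognisers in two stages: the word hypotheses are first aligned into a single word transition network, and each slot of that network is then decided by a vote. The Python sketch below covers only the voting stage and assumes the alignment has already been done; it uses plain frequency-of-occurrence voting without confidence weighting. The `rover_vote` name and the `@` null token are illustrative assumptions, not part of any particular ROVER implementation.

```python
from collections import Counter

NULL = "@"  # assumed placeholder for "no word" in an aligned slot

def rover_vote(aligned_hyps):
    """Majority vote over hypotheses that are already aligned slot by slot.

    aligned_hyps: list of equal-length word lists, one per system.
    Returns the winning word sequence with null slots removed.
    """
    if len({len(h) for h in aligned_hyps}) != 1:
        raise ValueError("hypotheses must be aligned to the same length")
    output = []
    for slot in zip(*aligned_hyps):
        # Frequency-of-occurrence vote; Counter.most_common keeps first-seen
        # order among equal counts, so ties go to the earliest system.
        word, _ = Counter(slot).most_common(1)[0]
        if word != NULL:
            output.append(word)
    return output
```

For example, voting over `["the", "cat", "sat"]`, `["the", "hat", "sat"]` and `["a", "cat", "@"]` yields `["the", "cat", "sat"]`: each slot takes the word most systems agree on.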
Pitch adaptive features for LVCSR
We have investigated the use of a pitch adaptive spectral representation for large vocabulary speech recognition, in conjunction with speaker normalisation techniques. We have compared the effect of a smoothed spectrogram with that of the pitch adaptive spectral analysis by decoupling these two components of STRAIGHT. Experiments performed on a large vocabulary meeting speech recognition task highlight the importance of combining a pitch adaptive spectral representation with a conventional fixed window spectral analysis. We found evidence that STRAIGHT pitch adaptive features are more speaker independent than conventional MFCCs without pitch adaptation, and therefore also perform better when combined using feature combination techniques such as Heteroscedastic Linear Discriminant Analysis.
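Feature-level combination of this kind can be pictured as concatenating two frame-synchronous feature streams (for example MFCCs and STRAIGHT-based cepstra) and projecting the result onto a lower-dimensional discriminative subspace. HLDA estimates that projection by maximum likelihood and allows class-specific covariances; the Python sketch below substitutes plain linear discriminant analysis (LDA), which assumes a shared within-class covariance, so it illustrates the idea rather than reproducing HLDA. The function names, the use of per-frame class labels (e.g. HMM state alignments) and the toy data are assumptions.

```python
import numpy as np

def lda_projection(features, labels, out_dim):
    """Estimate an LDA projection matrix.

    NOTE: plain LDA is a simpler stand-in for HLDA (assumption).
    features: (n_frames, dim) array; labels: (n_frames,) class ids.
    Returns a (dim, out_dim) projection matrix.
    """
    dim = features.shape[1]
    global_mean = features.mean(axis=0)
    s_within = np.zeros((dim, dim))
    s_between = np.zeros((dim, dim))
    for c in np.unique(labels):
        x_c = features[labels == c]
        mu_c = x_c.mean(axis=0)
        centred = x_c - mu_c
        s_within += centred.T @ centred              # within-class scatter
        diff = (mu_c - global_mean)[:, None]
        s_between += len(x_c) * (diff @ diff.T)      # between-class scatter
    # Leading eigenvectors of Sw^-1 Sb are the most discriminative directions.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_within, s_between))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:out_dim]].real

def combine_streams(stream_a, stream_b, labels, out_dim):
    """Concatenate two frame-synchronous streams and project them jointly."""
    if len(stream_a) != len(stream_b):
        raise ValueError("streams must have the same number of frames")
    stacked = np.hstack([stream_a, stream_b])
    return stacked @ lda_projection(stacked, labels, out_dim)
```

With two 13-dimensional streams and three classes, `combine_streams(a, b, labels, 2)` returns one 2-dimensional vector per frame. Plain LDA yields at most (number of classes - 1) informative directions, which is one reason real systems project from much richer class sets.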
Industrial Landscapes Between Environmental Sustainability and Landscape Constraints: The Case Study of Euralluminia in the Sulcis Area of Sardinia (Italy)
In Italy, industrialization developed remarkably in the 1950s and 1960s, with the priority of ensuring economic growth and development. The location of the industrial complexes was determined by the dynamics of production, which required a territory equipped with specific infrastructure such as water connections, sewers, gas pipelines and the electricity grid and, above all, with areas in which to build transport terminals capable of reducing the cost of handling the product. This led Italy to locate industrial activities at many coastal sites, close to pre-existing urban contexts, resulting in a well-defined coastal industrial landscape, especially in the areas of Southern Italy that were chosen as centers of development. Today, the determining factor in location choices is the cost of labor, and this has made the delocalization of companies increasingly frequent, with worrying repercussions both for direct and induced employment and for the degradation of the landscape. This process, combined with safety regulations, the need to update plants and increasingly rigorous landscape legislation, puts existing industrial sites that have not yet been abandoned in a critical position. For these reasons, the main objective of this article is to evaluate the compatibility between existing industrial areas at risk of delocalization and new interpretations of the environment and of the landscape to be reconstituted, so that goods can continue to be produced at current industrial levels within a framework of ecological protection rules and recently adopted landscape constraints.
In this regard, the authors use the Eurallumina industrial site in the Sulcis area of Sardinia (Italy) as a case study to analyze the problem of reconciling land uses in territories with an industrial vocation with the landscape components that deserve particular protection, not only for the economic and social context but also for the quality of the coastal environment. The case study is particularly significant because for some years Eurallumina was at risk of delocalization: it needed to convert some parts of its plants, but the conversion was blocked by the landscape regulations imposed by the Superintendence of Cultural Heritage of Southern Sardinia because of the expected changes to the coastal environment. Therefore, bearing in mind location theories and the pressures towards the delocalization of industrial activities, the study discusses the importance of the interconnection between economic and landscape factors, paying particular attention to coastal areas.
Speaker normalisation for large vocabulary multiparty conversational speech recognition
One of the main problems faced by automatic speech recognition is the variability of
the testing conditions. This is due both to the acoustic conditions (different transmission
channels, recording devices, noises etc.) and to the variability of speech
across different speakers (e.g. due to different accents, coarticulation of phonemes
and different vocal tract characteristics). Vocal tract length normalisation (VTLN)
aims at normalising the acoustic signal, making it independent from the vocal tract
length. This is done by a speaker specific warping of the frequency axis parameterised
through a warping factor. In this thesis the application of VTLN to multiparty
conversational speech was investigated focusing on the meeting domain. This
is a challenging task showing a great variability of the speech acoustics both across
different speakers and across time for a given speaker. VTL, the distance between
the lips and the glottis, varies over time. We observed that the warping factors estimated
using maximum likelihood seem to be context dependent, appearing to be
influenced by the current conversational partner and correlated with the behaviour
of formant positions and of pitch. This is because VTL also influences the
frequency of vibration of the vocal cords and thus the pitch. In this thesis we also
investigated pitch-adaptive acoustic features with the goal of further improving the
speaker normalisation provided by VTLN.
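The abstract does not specify the form of the warping function. A common choice in ASR toolkits is a piecewise-linear warp: frequencies below a break point are scaled by the warp factor, and a second segment maps the break point onto the Nyquist frequency so the warped axis keeps its range. The Python sketch below assumes that form; the break-point rule, the `cutoff_ratio` parameter and the convention that a factor above 1 stretches the axis are illustrative assumptions, and toolkits differ on all three. In practice the warp is usually applied to the filterbank centre frequencies.

```python
def warp_frequency(f, alpha, nyquist, cutoff_ratio=0.85):
    """Piecewise-linear VTLN warp of frequency f (Hz) by factor alpha.

    Assumed convention: alpha > 1 stretches the low-frequency region.
    Below the break frequency the axis is scaled by alpha; above it a
    second linear segment maps the break point onto the Nyquist frequency,
    so the warped axis still spans [0, nyquist].
    """
    if not 0.0 <= f <= nyquist:
        raise ValueError("f must lie in [0, nyquist]")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Keep alpha * f_break below nyquist so the upper segment stays increasing.
    f_break = cutoff_ratio * nyquist / max(alpha, 1.0)
    if f <= f_break:
        return alpha * f
    slope = (nyquist - alpha * f_break) / (nyquist - f_break)
    return alpha * f_break + slope * (f - f_break)
```

With an 8 kHz Nyquist frequency, `warp_frequency(1000.0, 1.1, 8000.0)` gives about 1100 Hz, while 8000 Hz still maps to 8000 Hz (up to rounding), so a single scalar controls the whole warp.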
We explored the use of acoustic features obtained using a pitch-adaptive analysis
in combination with conventional features such as Mel frequency cepstral coefficients.
These spectral representations were combined both at the acoustic feature
level using heteroscedastic linear discriminant analysis (HLDA), and at the system
level using ROVER. We evaluated this approach on a challenging large vocabulary
speech recognition task: multiparty meeting transcription. We found that VTLN
benefits the most from pitch-adaptive features. Our experiments also suggested that
combining conventional and pitch-adaptive acoustic features using HLDA results in
a consistent, significant decrease in the word error rate across all the tasks. Combining
at the system level using ROVER resulted in a further significant improvement.
Further experiments compared the use of pitch adaptive spectral representation with
the adoption of a smoothed spectrogram for the extraction of cepstral coefficients.
It was found that pitch adaptive spectral analysis, providing a representation which
is less affected by pitch artefacts (especially for high pitched speakers), delivers
features with improved speaker independence. Furthermore, this has also been shown to
be advantageous when HLDA is applied. The combination of a pitch adaptive spectral
representation and VTLN based speaker normalisation in the context of LVCSR
for multiparty conversational speech led to more speaker independent acoustic models,
improving the overall recognition performance.
Applying Vocal Tract Length Normalization to Meeting Recordings
Vocal Tract Length Normalisation (VTLN) is a commonly used
technique to normalise for inter-speaker variability. It is based
on the speaker-specific warping of the frequency axis, parameterised
by a scalar warp factor. This factor is typically estimated
using maximum likelihood. We discuss how VTLN may
be applied to multiparty conversations, reporting a substantial
decrease in word error rate in experiments using the ICSI meetings
corpus. We investigate the behaviour of the VTLN warping
factor and show that a stable estimate is not obtained. Instead, it
appears to be influenced by the context of the meeting, in particular
the current conversational partner. These results are consistent
with predictions made by the psycholinguistic interactive
alignment account of dialogue, when applied at the acoustic and
phonological levels.
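Maximum likelihood estimation of the warp factor is commonly done by a grid search: features are recomputed (or the filterbank rewarped) for each candidate factor, scored against the acoustic model, and the best-scoring factor is kept. The Python sketch below hides the acoustic model behind a caller-supplied `log_likelihood` function, a hypothetical placeholder, and assumes a default grid of 0.80 to 1.20 in steps of 0.02. Running it separately on successive segments of one speaker is one way to observe the kind of drift in the estimate described above.

```python
import numpy as np

def estimate_warp_factor(log_likelihood, grid=None):
    """Grid-search maximum likelihood estimate of a VTLN warp factor.

    log_likelihood: hypothetical callable alpha -> float, e.g. the total
    log-likelihood of a speaker's (or segment's) features, extracted with
    warp factor alpha, under the acoustic model.
    grid: candidate factors; defaults to 0.80, 0.82, ..., 1.20 (assumed).
    Returns (best_alpha, best_score).
    """
    if grid is None:
        grid = np.round(np.arange(0.80, 1.2001, 0.02), 2)
    scores = [log_likelihood(a) for a in grid]
    best = int(np.argmax(scores))
    return float(grid[best]), scores[best]
```

For a toy score such as `lambda a: -(a - 1.06) ** 2`, the search returns 1.06; with a real acoustic model each call would involve feature extraction and a forced alignment, which is why the grid is kept coarse.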
Floor Holder Detection and End of Speaker Turn Prediction in Meetings
We propose a novel fully automatic framework to detect which meeting participant is currently holding the conversational floor and when the current speaker turn is going to finish. Two sets of experiments were conducted on a large collection of multiparty conversations: the AMI meeting corpus. Unsupervised speaker turn detection was performed by post-processing the speaker diarization and the speech activity detection outputs. A supervised end-of-speaker-turn prediction framework, based on Dynamic Bayesian Networks and automatically extracted multimodal features (related to prosody, overlapping speech, and visual motion), was also investigated. These novel approaches resulted in good floor holder detection rates (13.2% Floor Error Rate) and attained state-of-the-art end-of-speaker-turn prediction performance.
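The abstract states only that speaker turns were obtained by post-processing diarization and speech activity detection output; the details are not given. As a purely hypothetical illustration of that kind of post-processing, the Python sketch below discards very short segments (treating them as backchannels that do not take the floor) and merges consecutive segments of the same speaker separated by short pauses. The `floor_turns` function and its `min_dur` and `max_gap` thresholds are assumptions, not the authors' method.

```python
from typing import List, Tuple

# (speaker, start time in seconds, end time in seconds)
Segment = Tuple[str, float, float]

def floor_turns(segments: List[Segment], min_dur: float = 1.0,
                max_gap: float = 0.5) -> List[Segment]:
    """Hypothetical heuristic: derive floor-holding turns from diarization.

    Segments shorter than min_dur are treated as backchannels and dropped;
    consecutive kept segments of the same speaker separated by at most
    max_gap seconds are merged into a single turn.
    """
    turns: List[Segment] = []
    for spk, start, end in sorted(segments, key=lambda s: s[1]):
        if end - start < min_dur:
            continue  # too short to count as taking the floor
        if turns and turns[-1][0] == spk and start - turns[-1][2] <= max_gap:
            _, prev_start, prev_end = turns[-1]
            turns[-1] = (spk, prev_start, max(prev_end, end))
        else:
            turns.append((spk, start, end))
    return turns
```

Here a brief interjection by speaker B inside speaker A's turn is ignored, so A keeps the floor until B starts a segment long enough to count as a turn of its own.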
Grafting of the 2,8-dithia-5-aza-2,6-pyridinophane macrocycle on SBA-15 mesoporous silica for the removal of Cu2+ and Cd2+ ions from aqueous solutions: synthesis, adsorption, and complex stability studies
Silica-based mesoporous materials have received growing attention in metal recovery from industrial processes, although, in general, the adsorption of metal ions by silanols is rather poor. Nevertheless, metal ion removal from aqueous solutions can be greatly improved by grafting metal chelators onto the particles’ surface. Combining the metal-chelating properties of organic ligands with the high surface area of mesoporous silica particles makes these hybrid nanostructured materials a promising new direction for metal recovery, sensing and controlled storage of metal ions in industrial and mining processes. Here, the 2,8-dithia-5-aza-2,6-pyridinophane (L) macrocycle was grafted on SBA-15 mesoporous silica to obtain the SBA-L mesoporous adsorbent for the removal and controlled recovery of Cd2+ and Cu2+ ions from aqueous solution over a broad pH range (4-11). By grafting about 0.3 mmol g−1 of L on SBA-15, maximum loading capacities of 20.9 mg g−1 and 31.8 mg g−1 were obtained for Cu2+ and Cd2+, respectively. The adsorption kinetics can be described by the pseudo-second-order model, while the adsorption isotherm (298 K) follows the Langmuir model. The latter, together with potentiometric studies, suggests that the adsorption mechanism is based on metal chelation by the grafted macrocycle. In summary, SBA-L is an effective copper(II) and cadmium(II) chelator for possible applications where metal removal, storage and recovery are of fundamental importance.
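The Langmuir isotherm and the pseudo-second-order kinetic model both have standard linearised forms, so their parameters can be estimated with an ordinary straight-line fit. The Python sketch below does this with `numpy.polyfit`; the function names, the assumed units (q in mg g−1, c_e in mg L−1, t in minutes) and the noise-free synthetic data in the check are illustrative assumptions, not the measurements reported in the paper. With real, noisy data, nonlinear least squares on the untransformed equations is often preferred, because linearisation distorts the error structure.

```python
import numpy as np

def fit_langmuir(c_e, q_e):
    """Langmuir isotherm q_e = q_max*K*c_e / (1 + K*c_e), fitted through
    its linear form c_e/q_e = c_e/q_max + 1/(K*q_max).
    Returns (q_max, K)."""
    c_e, q_e = np.asarray(c_e, float), np.asarray(q_e, float)
    slope, intercept = np.polyfit(c_e, c_e / q_e, 1)
    q_max = 1.0 / slope
    return q_max, 1.0 / (intercept * q_max)

def fit_pseudo_second_order(t, q_t):
    """Pseudo-second-order kinetics q_t = k2*q_e**2*t / (1 + k2*q_e*t),
    fitted through its linear form t/q_t = t/q_e + 1/(k2*q_e**2).
    Requires t > 0. Returns (q_e, k2)."""
    t, q_t = np.asarray(t, float), np.asarray(q_t, float)
    slope, intercept = np.polyfit(t, t / q_t, 1)
    q_e = 1.0 / slope
    return q_e, 1.0 / (intercept * q_e ** 2)
```

On synthetic data generated from a Langmuir curve with q_max = 30 and K = 0.5, `fit_langmuir` recovers both parameters, which makes it a convenient sanity check before applying the fit to measured isotherms.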
Predictive and motivational factors influencing anticipatory contrast: A comparison of contextual and gustatory predictors in food restricted and free-fed rats
In anticipation of palatable food, rats can learn to restrict consumption of a less rewarding food type, resulting in increased consumption of the preferred food when it is made available. This phenomenon is known as anticipatory negative contrast (ANC) and can help elucidate the processes that underlie binge-like behavior as well as self-control in rodent motivation models. In the current investigation we aimed to shed light on the ability of distinct predictors of a preferred food choice to generate contrast effects, and on the motivational processes that underlie this behavior. Using a novel set of rewarding solutions, we directly compared contextual and gustatory ANC predictors in both food restricted and free-fed Sprague-Dawley rats. Our results indicate that, despite being food restricted, rats are selective in their eating behavior and show strong contextually-driven ANC similar to free-fed animals. These differences mirrored changes in palatability for the less preferred solution across the different sessions, as measured by lick microstructure analysis. In contrast to previous research, predictive cues in both food restricted and free-fed rats were sufficient for ANC to develop, although flavor-driven ANC did not relate to a corresponding change in lick patterning. These differences in the lick microstructure between context- and flavor-driven ANC indicate that the motivational processes underlying ANC generated by the two predictor types are distinct. Moreover, an increase in premature port entries to the unavailable sipper – a second measure of ANC – in all groups reveals a direct influence of response competition on ANC development.
Transcription of conference room meetings: an investigation
The automatic processing of speech collected in conference-style meetings has attracted considerable interest, with several large-scale projects devoted to this area. In this paper we explore the use of various meeting corpora for the purpose of automatic speech recognition. In particular, we investigate the similarity of these resources and how to use them efficiently in the construction of a meeting transcription system. The analysis shows distinctive features for each resource. However, the benefit of pooling data, and hence the similarity between the resources, seems sufficient to speak of a generic conference meeting domain. In this context, this paper also presents work on the development of the AMI meeting transcription system, a joint effort by seven sites working on the AMI (augmented multi-party interaction) project.